Calibration and interpretation of log_scale parameter is poor #138
So the automatic rescaling finds the ratio of integrated flux between the data and the model; it's roughly equivalent to

```python
factor = np.trapz(data, wl) / np.trapz(model, wl)
return model * factor
```

I would recommend running this (or calling the autoscaling code directly) to see what factor comes out. As for the interpretability of the log_scale parameter, see the discussion below.
I'm wondering if there might be a units issue going on here. Obviously, for the comparison both data and model should be in the same units. When dealing with non-flux-calibrated spectra, it's easy to wrap any pre-factors for distance scale back into the normalization constant. This can also obscure more subtle points, like different sampling densities if one spectrum is in counts/pixel and the other is in erg/s/cm^2/A, for example, and would also paper over any possible bugs in Starfish.

We used flux-calibrated spectra for the 2015 paper, so in principle this is definitely possible. But that solution w.r.t. the distance scaling was fairly hacky, so hopefully we (Miles) managed to fix it with the emulator improvements. To dig into what's going on, I would hold off on full sampling for the moment and try to bound the problem with some debugging data.
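In that spirit, a minimal debugging sketch (synthetic arrays standing in for real spectra; not Starfish API) that bounds the problem by comparing integrated fluxes in common units:

```python
import numpy as np

# Synthetic stand-ins on a common wavelength grid, in the same units
# (e.g. erg/s/cm^2/A); real spectra would be loaded here instead.
wl = np.linspace(5000.0, 5100.0, 1000)
rng = np.random.default_rng(0)
data = rng.normal(1.0e-13, 1.0e-15, wl.size)   # flux-calibrated observation
model = np.full(wl.size, 2.0e7)                # surface flux, not distance-diluted

factor = np.trapz(data, wl) / np.trapz(model, wl)
print(f"flux ratio = {factor:.3e}  (log10 ~ {np.log10(abs(factor)):.1f})")
# A log10 ratio near -20 points to a missing (R/D)^2 dilution rather
# than a subtle bug in the likelihood itself.
```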
@iyera22 you are absolutely right that getting a physical scale out of log_scale is the crux of the issue here.
Sure, thank you!
Whether I include a log_scale parameter or not, it uses either of the two code snippets I mentioned in my post above, so I believe it is still scaling the mean flux calculated from the emulator to match the data first and then completing the reconstruction with the weights. I also think this is the same when implementing the sampling loop; I don't see a major difference overall. So if I use a log_scale, that is again calling the rescale function from transforms.py, which essentially scales up the mean flux from the emulator, or simply using a factor value to calculate the offset between the mean emulator flux and the data, following the code snippet from Miles' post above.

I wonder if the 'true' scaling value is recorded somewhere before everything gets normalized within the emulator? I think after that point the SpectrumModel object only refers to the eigenspectra and the mean flux coming out of the emulator, so perhaps that initial scaling from the raw model spectrum flux gets forgotten somewhere? Upon checking the emulator source code, it is consistent in that the only outputs there are the eigenspectra, weights, and the mean flux (+ mean std). I was hoping that if the emulator were modified to also output the offset between the mean/normalized emulator-interpolated spectrum and the original raw model flux from the grid (at the surface of the star), then I could add that to the log_scale parameter called within the SpectrumModel code when calculating the 'true offset' between the model and the data.
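For reference, a minimal sketch of what a rescale transform like the one in transforms.py plausibly does (a hypothetical reimplementation; whether Starfish uses base-10 or natural log here is an assumption worth checking against the source):

```python
import numpy as np

def rescale(flux, log_scale):
    # Hypothetical sketch: multiply the flux by the scale factor.
    # Base-10 assumed; Starfish's transforms.py may use np.exp instead.
    return flux * 10 ** log_scale
```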
To be clear, the emulator reconstruction is

```python
flux = (weights @ eigenspectra) * flux_std + flux_mean
```

so we can do some mental algebra and see that rescaling just the mean and std has the exact same effect as scaling the flux after reconstructing from the eigenspectra:

```python
flux = factor * ((weights @ eigenspectra) * flux_std + flux_mean)
# is equivalent to
flux = (weights @ eigenspectra) * (flux_std * factor) + (flux_mean * factor)
```

This happens every time inside the model evaluation.
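A quick numerical check of that algebra, with synthetic shapes standing in for the real emulator products:

```python
import numpy as np

# Synthetic stand-ins for the emulator products
rng = np.random.default_rng(42)
weights = rng.normal(size=(1, 4))         # 4 principal components
eigenspectra = rng.normal(size=(4, 100))  # 100 wavelength pixels
flux_mean = rng.normal(1.0, 0.1, 100)
flux_std = rng.normal(0.5, 0.05, 100)
factor = 3.7e-21

a = factor * ((weights @ eigenspectra) * flux_std + flux_mean)
b = (weights @ eigenspectra) * (flux_std * factor) + (flux_mean * factor)
assert np.allclose(a, b)  # the two forms are algebraically identical
```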
The current model is basically set up as either "you fit log_scale as a free (nuisance) parameter" or "you let the model autoscale"; in neither case does the value map cleanly onto a physical (R/D)^2.
Hi, thanks for the clarification,
So is log_scale here really log(R^2/D^2)? What I am looking for is the ratio between the non-normalized grid model flux (my original model grid, plus model spectra for interpolated parameter values, before the normalization in the emulator occurs) vs. the flux-calibrated data (observed star), as that scaling will be helpful for calculating (R/D)^2 to get constraints on the radius. Am I missing something?
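If log_scale really were log10((R/D)^2), the radius would follow directly from a known distance. A minimal sketch (hypothetical numbers; astropy assumed available):

```python
import numpy as np
import astropy.units as u

log_scale = -19.5     # hypothetical value, assumed to equal log10((R/D)^2)
distance = 25 * u.pc  # known independently, e.g. from a Gaia parallax

radius = np.sqrt(10 ** log_scale) * distance
print(radius.to(u.Rsun))  # implied stellar radius
```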
I am somewhat set for now, as I'm currently following the 'opposite approach': I un-dilute my flux-calibrated data spectrum by a constant value so it matches the order of magnitude of my original model grid spectrum at the surface of the star, and then perform the Starfish sampling without fitting for log_scale (letting it autoscale). This hacky way seems to work fine for a couple of targets I've tried and for the purposes of my work for now, but I hope there will be a more straightforward way of solving for the radius from the Starfish fits in a future upgrade. Thanks!
Okay, I think I've fully got my head wrapped around the problem. When you create an emulator from a grid, it becomes unitless because of the normalization at https://github.com/iancze/Starfish/blob/fceb6cc41ddec569c95e4157c8406c0b586f6140/Starfish/emulator/emulator.py#L286. I think the easiest way to resolve this would be to add an option to the emulator that keeps track of the normalization applied to each grid spectrum, so the physical scale can be restored later.
Based on my discussion with @iyera22 in a separate Slack channel several days ago, I think the issue is that the physical flux scaling is lost when the emulator normalizes each grid spectrum. Tracking these flux scalars and interpolating them over the grid parameters was initially suggested by @gully in his branch. If we would like to keep this feature for the new Starfish version, that flux-scalar interpolation would need to be ported over. Hope this helps.
Thank you both! Yeah, that would be a helpful feature, although I'm wondering how significant the correlation can be if the spectra are not normalized within the emulator reconstruction? How much would that affect the covariance matrix?
Re: Miles' point about normalization: the PHOENIX spectral grid is provided in units of intensity at the stellar surface (if my memory serves me correctly), so there is intrinsically a very large change in amplitude from, say, 3200 K to 7000 K. While in theory you might be able to construct an emulator directly on the unnormalized spectra, it would make the PCA decomposition very inefficient. The largest variation in an unnormalized grid is due to the amplitude differences, so several PCA terms would need to be devoted to absorbing this amplitude change. That's the main reason why each spectrum is normalized to a mean of 1 before PCA.

One solution to this is to keep track of the amplitude term as a function of Teff, logg, Z, etc. as the emulator is being constructed, and create a smooth interpolator object for these normalizations. Then, when the emulator reconstructs spectra (which have a mean of 1), the amplitude rescaling can be applied after the fact to bring the product back to the correct units. I remember @gully and I discussed this at some point, but I forget whether we ever actually implemented it (@gully must have needed it for his LkCa 4 project) and how that factored into Miles' rewrite. Hence the confusing issue we now have! My apologies for losing track of this!

As it stands, it looks like the physical interpretation of log_scale is incorrect; it's merely a nuisance parameter. I think the suggestion by @zjzhang42 sounds like a reasonable approach to produce correct fluxes. In the end, though, this is probably only worth it if you are additionally fitting photometry simultaneously with the spectra via synthetic photometry. Spectrophotometric flux calibration uncertainties are in general worse than photometric calibration uncertainties. So, unless you are fitting additional photometric points, I would expect the estimate of stellar radius that comes from fitting spectra alone to carry a significant uncertainty.
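A sketch of that interpolator idea (hypothetical names and a synthetic grid; not the actual Starfish API): record each grid spectrum's mean before normalizing, fit a smooth interpolator in (Teff, logg, Z) over the log-amplitudes, and reapply it after reconstruction.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical grid axes and raw surface fluxes: shape (nT, ng, nZ, n_wl)
teff = np.array([3200.0, 3500.0, 3800.0])
logg = np.array([4.5, 5.0])
feh = np.array([-0.5, 0.0])
rng = np.random.default_rng(1)
raw_fluxes = np.abs(rng.normal(1e7, 1e6, (3, 2, 2, 1000)))

# Record each spectrum's amplitude before normalizing to mean 1
flux_scalars = raw_fluxes.mean(axis=-1)            # shape (3, 2, 2)
normalized = raw_fluxes / flux_scalars[..., None]  # what the PCA sees

# Smooth interpolator over log10(amplitude); the log10 tames the huge
# dynamic range in amplitude across Teff
scalar_interp = RegularGridInterpolator((teff, logg, feh),
                                        np.log10(flux_scalars))

# After the emulator reconstructs a mean-one spectrum at some parameters,
# reapply the interpolated amplitude to recover surface-flux units
params = (3300.0, 4.8, -0.2)
reconstructed = normalized[0, 0, 0]  # stand-in for an emulator output
physical_flux = reconstructed * 10 ** scalar_interp(params)
```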
I agree with everything @iancze said! I already implemented the interpolation of flux scalars as a term in my branch: https://github.com/gully/jammer-Gl570D. And to reiterate what Ian said, the principal applications for this feature are: you have (accurately) flux-calibrated spectra or photometry, you're simultaneously fitting photometry, or you're fitting a mixture model of two spectra where the relative fluxes matter, as in Gully-Santiago et al. 2017.
In fact, Appendix A of Gully-Santiago et al. 2017 lays out the math for the implementation. We have tweaked the implementation since then to improve the accuracy over a high dynamic range of temperatures.
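For readers without the paper at hand, here is the shape of that model in my own notation (a paraphrase under assumptions, not a quote of Appendix A): the emulator returns a mean-one spectrum, a smooth flux scalar restores surface-flux units, and the solid angle dilutes the result to the observed flux.

```latex
% Sketch of the scaling model (paraphrase, not a quote of Appendix A)
f_{\mathrm{model}}(\lambda)
  = \underbrace{\left(\frac{R}{D}\right)^{2}}_{\text{solid angle}}
    \, s(\theta)\, \hat{f}(\lambda;\,\theta)
```

Here \(\hat{f}\) is the emulator's mean-one reconstruction, \(s(\theta)\) is the interpolated flux scalar at grid parameters \(\theta\), and the prefactor carries the radius and distance.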
The interpretation of log_scale only becomes physical once this flux scalar is folded back in.
Thanks all, I'm trying to revisit this issue. I have a question for @gully: I'm trying to test out the implementation from your branch, especially 'star_marley.py' (I hope I'm looking in the right place; I'm a bit lost), and I need clarification on which Starfish version is needed to support it. I gather the emulator.py file that includes the interpolation for the flux_scalar value is in version 2.0; could you confirm? I also don't see a function to train the emulator in that version, so could I get more guidance on how to get started? And what's a more efficient way to include log_omega in the sampling with v0.3?
Specifically, what I'm looking for is something similar to the example implementation for brown dwarf low-resolution data that you mentioned and that @zjzhang42 demonstrated in his paper: fitting for log_omega as an additional physical parameter. Is there a branch compatible with v0.3 that includes this?
@iyera22 the functionality Gully mentioned is not compatible with v0.3 of Starfish (yet).
Hi @iyera22, I spoke with @zjzhang42 and he plans to make his version of my version of Ian's version of Starfish public. Lol, that's a lot of versions... @mileslucas's version (v0.3) is incompatible with my version, and by extension with ZJ's version. It looks like Miles is working to rectify these two versions. It sounds like you should connect with ZJ to get the brown dwarf version working, or wait for Miles' extended version. Does that address your question?
@gully @zjzhang42 that's great to hear. Perhaps it makes sense for me to migrate iancze/Starfish to a GitHub organization like Starfish-dev/Starfish and give you all maintainer access? I think it would be really great for the long-term health of the package if we could centralize development around forks and pull requests back to a single repository.
Maybe! I'd say let's wait for ZJ's version to appear so we can see what differences there are. Or maybe we're talking about a bigger/broader scope and experimentation?
Thank you! That would be tremendously helpful! I look forward to the release of @zjzhang42's version and this potential new flexible version.
@gully I'm on board with the idea of splitting Starfish into smaller, more maintainable modules focused on specific use cases, not unlike the exoplanet-dev organization. I think the grid access, emulator, and likelihood are natural places to draw the lines, and something like a Starfish-dev organization would be a great way to centralize all of this. @mileslucas already did a lot of legwork to make everything more modular in v0.3, and I would envision this as carrying that work to its logical conclusion. This would also make it easier to rewrite, say, the likelihood component to be autodiff-capable. But even before that, centralizing around a single Starfish-dev organization would help coordinate development efforts.
Ah, I understand the proposition now: making a GitHub organization. Yeah, sure, I think that's a great idea. Loosely affiliated collections of packages and experimental new versions can live there too. Also, yes, autodiffable (or, if someone has the guts, writing out the analytical derivatives) is the way to go. One issue may be making the interpolation and resampling step autodiffable; I tried several times to figure this out, and a few attempts worked with different tradeoffs. But anyway, that's a completely different issue unrelated to this one.
I just noticed that most of the Starfish scripts I used and modified are already in my branch: https://github.com/zjzhang42/Starfish/tree/ZJ_BD_v0. And I strongly agree that we should merge all these branches into a single solid package. To illustrate how to combine all these separate scripts into a complete spectral fit, I could provide the yaml and bash scripts as examples. All of this would probably come after July, as I am currently overwhelmed by an upcoming PhD defense.
I'm working with @zjzhang42 on forward analyses of brown dwarf atmospheres this summer and have run into problems related to this issue. It was suggested that I share some details here.

I started by fitting PHOENIX models to M-dwarf spectra as a test case. Similar to the original issue raised by @iyera22, I kept getting flatter-than-expected posteriors for any MCMC runs using Starfish v0.3. Specifically, the surface gravity and metallicity posteriors fill up the allowed parameter space, and the global covariance hyper-parameters appear to be unconstrained; in the chains from an example fit, the global covariance parameters just wander. Besides providing bad fits, this often leads to problematic global covariance parameters that return errors (i.e., the covariance matrix is not positive-definite, or nans/infs cannot be on the diagonal). Looking at the overall covariance matrix after optimizing the model shows that the flux uncertainties and global covariance are dominated by the emulator covariance matrix.

Because the example spectrum included in the documentation provides good fits, I used it to test what could be going wrong. The initial fits provided nice, Gaussian distributions for each parameter, and in the total covariance matrix the flux uncertainties and global covariance are apparent on the diagonal. I noticed that the example spectrum is approximately on the same scale as the normalized emulator spectra, giving a log_scale near zero. At first, I thought that the emulator covariance was being scaled incorrectly, but tests show that it is scaled consistently with the flux (see the check below).

[attachment: m4gl213_with_fluxscalar_corner.pdf]

Perhaps I am completely wrong and the emulator covariance is supposed to be the primary source of uncertainty in our models. However, I am surprised that it would dominate so much, given that the emulators for the PHOENIX spectra tend to be well trained. Further, I do not understand why the emulator covariance matrix would grow so much larger than the flux uncertainties when the emulator model is being down-scaled to smaller values. I also think it's worth noting that after incorporating the flux scalar, the posteriors become much better constrained (see the attached corner plot).
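For what it's worth, here is the kind of scaling check described above, with synthetic matrices and a hypothetical flux ratio: multiplying the reconstructed flux by a factor multiplies the emulator covariance by the factor squared, and comparing the scaled diagonal to the data variances shows which term dominates the likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.normal(size=(n, n))
emu_cov = A @ A.T + n * np.eye(n)  # synthetic positive-definite "emulator" covariance
factor = 1e-10                     # hypothetical model-to-data flux ratio

scaled_cov = factor ** 2 * emu_cov  # covariance scales as the square of the flux factor
sigma_data = 1e-12                  # hypothetical per-pixel flux uncertainty

# If the scaled emulator variances still dwarf sigma_data**2, the
# likelihood is dominated by the emulator term, not the data.
print(np.median(np.diag(scaled_cov)) / sigma_data ** 2)
```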
Hi! A quick update on the emcee convergence issue. I have the following situation:
The problem, however, is that within the SpectrumModel code, log_scale scales the normalized emulator-interpolated spectrum (mean flux) to match the flux-calibrated data, and so the best-fit log_scale doesn't fall within physically realistic radius values.
Additionally, if for example my raw model flux (computed at the surface of the star) is about 10-20 dex higher than the flux-calibrated data, and I use an optimum log_scale that visually works well to match the two, then the emcee posteriors for all best-fit parameters appear flat. The same happens when I do not fit for log_scale and simply let it autoscale (via the autoscaling code snippet from earlier in the thread). I understand this huge scaling offset between data and model contributes massively to the covariance matrix.
The only situation where I get well-constrained posteriors for all parameters is when I scale up the flux-calibrated data to roughly match the level of the stellar model flux at the surface of the star before performing the fit.
Here is an example of the retrieved Teff, both when I fit for a visually good log_scale with appropriate priors and when I do not fit for log_scale at all and let it autoscale.
And here is an example of the same retrieved Teff, again fitting for a visually good log_scale and also not fitting for log_scale at all, but this time with the input flux-calibrated data modified to roughly match the level of the model flux at the surface of the star. Of course, this is not the desired way to do this if I'm looking to properly solve for the radius of the object.
I'm wondering if this version of the code stores the flux-scaling value somewhere before everything gets normalized inside the emulator? Maybe I could retrieve that value and use it within the SpectrumModel code to specifically fit for log((R/d)^2)? Thanks!
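A hedged sketch of how one might recover that lost scale by hand (synthetic stand-ins; the real grid loaders and emulator output API may differ): ratio the integrated raw grid flux against the emulator's mean-one reconstruction at the same grid point, and add the log of that ratio to log_scale.

```python
import numpy as np

# Synthetic stand-ins: a raw grid spectrum at a grid node (surface units)
# and the emulator's mean-one reconstruction at the same parameters.
wl = np.linspace(5000.0, 5100.0, 1000)
raw_flux = np.full(wl.size, 2.3e7)  # pretend PHOENIX surface flux
emu_flux = np.ones(wl.size)         # mean-one emulator output

flux_scalar = np.trapz(raw_flux, wl) / np.trapz(emu_flux, wl)
log_offset = np.log10(flux_scalar)  # add to log_scale to recover a physical (R/D)^2
print(log_offset)
```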
Originally posted by @iyera22 in #136 (comment)